A chromosome-level genome assembly of the jade perch (Scortum barcoo)

Endemic to Australia, the jade perch (Scortum barcoo) is a highly profitable freshwater bass species. It has extraordinarily high levels of omega-3 polyunsaturated fatty acids (PUFAs), but the genes involved in their biosynthesis remain largely unclear. In addition, although chromosome-level genomes of four bass species have been sequenced previously, the ancestral bass karyotype has not been estimated. We therefore sequenced, assembled and annotated the jade perch genome to characterize the genes for omega-3 PUFA biosynthesis and to deduce the ancestral bass karyotype. We constructed a chromosome-level genome assembly of 657.7 Mb with 24 pairs of chromosomes, with contig and scaffold N50 values of 4.8 Mb and 28.6 Mb, respectively. We also identified repetitive elements (accounting for 19.7% of the genome assembly) and predicted 26,905 protein-coding genes. Furthermore, we performed genome-wide localization and characterization of several important genes encoding key enzymes in the PUFA biosynthesis pathway. These genes may contribute to the high omega-3 content of jade perch. Moreover, we conducted a series of comparative genomic analyses among four representative bass species at the chromosome level, resulting in a deduced ancestral bass genome.

Mammals have fads1, fads2 and fads3, whereas most teleosts have only a single fads2 10,11 . The fads2 gene encodes a desaturase that introduces a double bond between defined carbons of the fatty acyl chain. In various teleosts, fads2 enzymes with different positional specificities have been described, including Δ4, Δ5 and Δ6 desaturases [12][13][14] . Δ8 desaturase activity of fads2 has also been reported in some studies [15][16][17] . Moreover, seven members of the elovl gene family (elovl1-7) have been reported in mammals 18 . Among them, elovl1, elovl3, elovl6 and elovl7 encode enzymes that catalyze the elongation of saturated and monounsaturated fatty acids, while elovl2, elovl4 and elovl5 encode enzymes that elongate PUFAs [19][20][21] . The functions of fish elovl2, elovl4 and elovl5 are consistent with those of their mammalian counterparts 9,[22][23][24][25][26] . Recently, two elovl8 isotypes (elovl8a and elovl8b; not found in mammals) were first reported in rabbitfish (Siganus canaliculatus), and elovl8b has been shown to play an important role in PUFA elongation 27 . Given its high omega-3 PUFA content, the jade perch is a good model for detailed exploration of the omega-3 PUFA biosynthesis pathway.
Furthermore, reconstructions of the ancestral teleost karyotype and models of teleost genome evolution have been proposed previously based on teleost genomes such as those of Tetraodon and medaka [28][29][30] . The medaka genome has been reported to retain the ancestral teleost karyotype well, without major interchromosomal rearrangements 30 . However, although the genomes of four bass species have been sequenced at the chromosome level, the ancestral bass karyotype has not yet been deduced.
Here, we applied an integrated strategy of Illumina, Nanopore and Hi-C (high-throughput chromosome conformation capture) sequencing to generate sufficient high-quality data for genome assembly, annotation and chromosome construction. We not only localized several key genes related to omega-3 PUFA biosynthesis onto chromosomes, but also proposed potential mechanisms for the high omega-3 PUFA content of jade perch flesh. We also established the phylogenetic position of jade perch relative to seven representative teleosts based on single-copy genes, and deduced the ancestral bass karyotype, identifying a chromosome translocation shared by the four examined bass species. As an important genetic resource, this jade perch genome assembly will support in-depth biological studies of bass and practical molecular breeding of this economically important fish.

Methods
Sample collection. A three-year-old jade perch (Fig. 1), bred at a local fish farm in Foshan City, Guangdong Province, China, was sampled for this study. We collected muscle, spleen, head kidney, body kidney and thymus from this individual for genome and transcriptome sequencing. All samples were flash-frozen in liquid nitrogen and stored at -80 °C until use. The sampling protocol was approved by the Institutional Review Board on Bioethics and Biosafety of BGI-Shenzhen, China (No. FT18134).
DNA extraction and genome sequencing. Genomic DNA was extracted from muscle tissue using a Nucleic Acid Kit (Qiagen, Germantown, MD, USA) following the manufacturer's instructions. The DNA was used to construct libraries for sequencing on the Illumina (Illumina Inc., San Diego, CA, USA) and Nanopore (Oxford Nanopore Technologies, Oxford, UK) platforms. In brief, three paired-end libraries with different insert sizes (270 bp, 500 bp and 800 bp) were constructed and sequenced on an Illumina HiSeq X Ten platform, generating about 118.3 Gb of 150-bp paired-end raw reads. After removal of low-quality and duplicated reads and adapter sequences with SOAPfilter v2.2 31 (parameters: -z -p -g 1 -M 2 -f 0), approximately 109.9 Gb of clean reads were retained for assembly. For the Nanopore library, genomic DNA was size-selected (10-50 kb) with a BluePippin system (Sage Science, Beverly, MA, USA) and processed using the Ligation Sequencing 1D kit (SQK-LSK109, Oxford Nanopore Technologies) according to the manufacturer's instructions. About 82.6 Gb of raw data in fast5 format were collected. After removal of low-quality reads (mean_qscore < 7), 40.0 Gb of reads with a mean length of 21.2 kb passed quality control. The Nanopore reads were then error-corrected with the Illumina clean reads using LoRDEC 32 with default parameters.
RNA extraction and transcriptome sequencing. Muscle, spleen, head kidney, body kidney and thymus tissues of the jade perch were homogenized in TRIzol Reagent (Invitrogen, Carlsbad, CA, USA). Library construction and transcriptome sequencing were performed on an Illumina HiSeq X Ten platform in accordance with the manufacturer's protocols. A total of 20.0 Gb of data (about 4.0 Gb per tissue) was generated for transcript and genome annotation.
www.nature.com/scientificdata www.nature.com/scientificdata/
Estimation of the genome size. We estimated the genome size of jade perch using standard 17-mer frequency distribution analysis 33,34 , with clean reads from the 800-bp insert Illumina library as input. The genome size was calculated as genome size = k-mer number / estimated k-mer depth. With sufficient sequencing data, the k-mer frequency distribution approximately follows a Poisson distribution, and the peak of the observed k-mer distribution curve is taken as the expected k-mer depth. The estimated genome size of jade perch is about 692.7 Mb, based on 5.8 × 10^10 k-mers and a peak k-mer depth of 84.
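The k-mer calculation above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes a histogram mapping k-mer depth to the number of distinct k-mers observed at that depth (the two-column layout written by tools such as `jellyfish histo`), and the `min_depth` cutoff used to skip the low-depth error spike is a hypothetical choice. With the reported values, 5.8 × 10^10 / 84 ≈ 690 Mb, consistent with the 692.7 Mb estimate given that the k-mer count is rounded.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 2) -> tuple[int, int, float]:
    """Estimate genome size as total k-mer count / peak k-mer depth.

    histogram: k-mer depth -> number of distinct k-mers observed at that depth.
    min_depth: depths below this are skipped when locating the peak, because the
    spike at depth 1 is dominated by sequencing errors (hypothetical cutoff).
    """
    # Total k-mer number: each distinct k-mer contributes once per occurrence.
    total_kmers = sum(depth * count for depth, count in histogram.items())
    # Expected k-mer depth: the depth with the most distinct k-mers outside the error region.
    candidates = {d: c for d, c in histogram.items() if d >= min_depth}
    peak_depth = max(candidates, key=candidates.get)
    return total_kmers, peak_depth, total_kmers / peak_depth


# Toy histogram (made-up numbers for illustration only).
toy = {1: 1000, 2: 10, 3: 50, 4: 100, 5: 50, 6: 10}
print(estimate_genome_size(toy))  # (1880, 4, 470.0)
```

Whether error k-mers are also subtracted from the total varies between pipelines; the sketch keeps every k-mer in the numerator, matching the plain formula given in the text.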
De novo genome assembly and chromosome construction. For de novo genome assembly, we adopted a hybrid strategy combining the clean Illumina and Nanopore reads. First, we used Platanus v1.2.4 35 (parameters: -k 29 -d 0.3 -t 16 -m 450) to generate initial contigs from the clean Illumina reads. These contigs and the error-corrected Nanopore long reads were then used as input for the DBG2OLC 36 assembly pipeline (parameters: LD 1 k 17 KmerCovTh 6 MinOverlap 60 AdaptiveTh 0.012). This produced a high-quality scaffold-level assembly of 657.7 Mb, accounting for 94.9% of the estimated genome size. We evaluated the completeness of this assembly with BUSCO v5.2.2 37 . In genome mode, BUSCO reported C: 97.9% (S: 95.6%, D: 2.3%), F: 0.1%, M: 2.0%, n = 3,640 (C: complete, S: single-copy, D: duplicated, F: fragmented, M: missing, n: total BUSCO groups in the Actinopterygii_odb9 data set). In total, 3,565 of the 3,640 (97.9%) Actinopterygii BUSCO genes were identified as complete in the jade perch assembly (Table 1).
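As a quick consistency check on the BUSCO numbers above, the reported percentages can be written in BUSCO's one-line summary format and tested: complete should equal single-copy plus duplicated, and complete + fragmented + missing should sum to 100%. The sketch below assumes that one-line format; the parsing function, the consistency check and its tolerance are illustrative helpers, not part of BUSCO.

```python
import re


def parse_busco_summary(line: str) -> dict[str, float]:
    """Parse a BUSCO one-line summary, e.g. 'C:97.9%[S:95.6%,D:2.3%],F:0.1%,M:2.0%,n:3640'."""
    return {key: float(value) for key, value in re.findall(r"([A-Za-z]+):([\d.]+)", line)}


def busco_is_consistent(summary: dict[str, float], tol: float = 0.15) -> bool:
    """Check C == S + D and C + F + M == 100 within a rounding tolerance (in percent)."""
    complete_ok = abs(summary["C"] - (summary["S"] + summary["D"])) <= tol
    total_ok = abs(summary["C"] + summary["F"] + summary["M"] - 100.0) <= tol
    return complete_ok and total_ok


# Values reported in the text, written in BUSCO's one-line summary layout.
reported = parse_busco_summary("C:97.9%[S:95.6%,D:2.3%],F:0.1%,M:2.0%,n:3640")
print(busco_is_consistent(reported))          # True
print(round(3565 / reported["n"] * 100, 1))   # 97.9
```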
The Hi-C technique was applied to construct a chromosome-level genome of jade perch. In brief, a muscle sample was fixed with formaldehyde. The extracted genomic DNA was digested with the restriction enzyme MboI, and the 5′ overhangs were filled in with biotinylated nucleotides. A paired-end library with an insert size of approximately 300 bp was constructed, and about 48.0 Gb of Hi-C reads were generated on the HiSeq X Ten platform. These Hi-C reads were mapped onto the assembled scaffolds using Bowtie2 38 (parameters: --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder). We then obtained all valid read pairs with scaffold linkage information from the results of HiC-Pro v2.8.0 39 with default parameters. Juicer v1.5 40 (parameter: chr_num 24) and 3D-DNA v170123 41 (parameters: -m haploid -r 2) were applied to anchor the scaffolds into chromosomes. We also used Juicebox v1.11.08 42 to correct misjoins and remove duplicated contigs, and the 3D-DNA pipeline used the resulting correction file to generate the final chromosome-level genome assembly. Using these Hi-C data, we constructed 24 chromosomes with a total length of 642.9 Mb, accounting for 97.7% of the scaffold-level genome (Fig. 2). The contig and scaffold N50 values of the final chromosome-level assembly are 4.5 Mb and 28.6 Mb, respectively.
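N50, used above for both contigs and scaffolds, is the largest length L such that sequences of length ≥ L together cover at least half of the total assembly length. A minimal Python sketch follows; deriving contigs by splitting scaffolds at runs of N is an assumption about how gaps are represented, since the gap convention behind the reported contig N50 is not stated.

```python
import re


def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # cumulative length has reached half of the total
            return length
    raise AssertionError("unreachable: the loop always reaches the total")


def contig_lengths(scaffolds: list[str]) -> list[int]:
    """Split scaffold sequences at runs of N (assumed gap representation)."""
    return [len(piece) for seq in scaffolds for piece in re.split(r"[Nn]+", seq) if piece]


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))         # 8
print(contig_lengths(["ACGTNNNNACG", "AAAA"]))  # [4, 3, 4]
```

Computing N50 on `contig_lengths(...)` gives a contig N50 and on the unsplit scaffold lengths a scaffold N50, which is why the two values differ for the same assembly.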
Repeat annotation. We employed two approaches, ab initio prediction and homology annotation, to detect repetitive elements in the assembled genome of jade perch. For the ab initio prediction, we applied RepeatModeler v1.0.8 43 45 with default parameters were employed to search repeat elements in the genome assembly based on the repeat library from the ab initio prediction. In total, we predicted about 129.5 Mb of repetitive sequences, accounting for 19.7% of the assembled jade perch genome. This proportion lies between those of European seabass (21.5%) 48 and large yellow croaker (18.1%) 49 , both of which belong to Percomorphaceae.
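Because ab initio and homology-based searches often annotate the same repeat more than once, a non-redundant repeat total such as the 129.5 Mb above is normally obtained by merging overlapping intervals before summing. The source does not say how the two result sets were combined, so the Python sketch below is an assumption: it takes half-open (chromosome, start, end) intervals pooled from any tools and returns the number of bases covered.

```python
from collections import defaultdict


def merged_coverage(intervals: list[tuple[str, int, int]]) -> int:
    """Bases covered by half-open (chrom, start, end) intervals after merging overlaps."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlaps or abuts the current block: extend it
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


hits = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 0, 10)]
print(merged_coverage(hits))          # 210
print(round(129.5 / 657.7 * 100, 1))  # 19.7 (% of the assembly, from the reported totals)
```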
We predicted a total of 26,905 protein-coding genes (close to the 26,719 of European seabass; Table 2) with a mean gene length of 15,168.9 bp. Genes contain an average of 9.9 exons, with a mean exon length of 179.6 bp. About 97.9% (26,327) of the predicted genes were assigned at least one functional annotation (Table 3).
Ortholog and phylogenetic analyses. Reference protein sequences of eight representative species, including Asian arowana (Scleropages formosus), zebrafish, medaka, Asian seabass (Lates calcarifer), threespine stickleback, European seabass, American black bass (Micropterus salmoides), and Chinese seabass (Lateolabrax maculatus), were downloaded from the NCBI database (release 75). These protein sets and our jade perch protein set were filtered by removing sequences shorter than 50 amino acids. An all-against-all BLASTP search with an E-value ≤ 10^-5 was performed to identify homologous sequences 51 . The protein sets were then clustered into gene families using OrthoMCL v2.0.9 59 with default parameters.
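The length filter described above (dropping proteins shorter than 50 amino acids before the all-against-all BLASTP) can be sketched as follows. This is an illustrative Python version, not the authors' script; not counting a trailing `*` stop symbol is an assumption, since protein FASTA files differ in whether they include it.

```python
from collections.abc import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines; sequences may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short_proteins(records: Iterable[tuple[str, str]], min_len: int = 50) -> list[tuple[str, str]]:
    """Keep proteins with at least min_len residues; a trailing '*' stop is not counted."""
    return [(h, s) for h, s in records if len(s.rstrip("*")) >= min_len]


example = [">a", "M" * 49, ">b", "M" * 30, "M" * 20, ">c", "M" * 49 + "*"]
kept = filter_short_proteins(read_fasta(example))
print([header for header, _ in kept])  # ['b']
```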
To determine the phylogenetic position of jade perch, we used MUSCLE v3.8.31 60 to align the single-copy orthologous genes from the nine species. The aligned protein sequences were then converted to their corresponding coding sequences, and the alignments were concatenated to form a single supergene for each species. The supergene alignments were used to construct a maximum likelihood (ML) tree with PhyML v2.4.4 61 (parameters: -rates gamma -a e -c 4 -t e). MCMCtree 62 (parameters: -model HKY85, -clock independent rates, -seed -1) in the PAML package v4.9 was used to estimate divergence times among the nine fish species.
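The two steps above, converting aligned proteins back to codon alignments and concatenating single-copy gene alignments into one supergene per species, can be sketched in Python. The first step follows the same idea as tools such as PAL2NAL, but this is a simplified sketch under stated assumptions: it does not verify that each codon translates to the residue it is paired with, and it assumes every gene alignment contains exactly the same species. The species names in the example are toy labels.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Replace each aligned residue with its codon from the CDS and each gap with '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            if k >= len(codons):
                raise ValueError("CDS is shorter than the protein")
            out.append(codons[k])
            k += 1
    return "".join(out)  # leftover codons (e.g. a stop codon) are dropped


def concatenate_supergenes(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (species -> aligned sequence) into one supergene per species."""
    species = set(alignments[0])
    for aln in alignments:
        if set(aln) != species:
            raise ValueError("every single-copy gene must include the same species")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
    # Genes are joined in list order, so the gene order is identical for every species.
    return {sp: "".join(aln[sp] for aln in alignments) for sp in sorted(species)}


print(protein_to_codon_alignment("M-K", "ATGAAATAA"))  # ATG---AAA
genes = [{"jade_perch": "AC", "seabass": "A-"}, {"jade_perch": "GGG", "seabass": "GGT"}]
print(concatenate_supergenes(genes))  # {'jade_perch': 'ACGGG', 'seabass': 'A-GGT'}
```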
The resulting phylogenetic tree (Fig. 3) shows that jade perch and European seabass cluster into one clade with a bootstrap value of 100. This close relationship between the two bass species is consistent with their traditional morphological classification. Their divergence time was estimated at 71.7 million years ago (Fig. 3).

Table 3. Function annotation of genes by multiple methods.
Reconstruction of ancestral chromosomes from four representative bass species. The ancestral teleost karyotype was estimated in previous studies to contain 13 potential chromosome pairs, designated Ancestor Chromosomes a-m 29,63 . The protein set of each of four representative bass species (jade perch, Chinese seabass, American black bass and European seabass 64 ) and of stickleback was aligned separately to that of the predicted ancestor chromosomes using BLASTP (E-value < 1 × 10^-10). We then identified the reciprocal best-hit alignments between each of the four bass species and the ancestor chromosomes. Finally, chromosome fissions, fusions and translocations were identified and visualized using the SVG module in Perl (Fig. 4). We found that jade perch, European seabass, American black bass and Chinese seabass have retained most of the ancestral chromosome karyotype (Fig. 4), but all four share a common chromosome translocation relative to the ancestral chromosomes. Because our jade perch genome assembly is more complete than the reported genomes of other bass species (Table 4), it may be the most suitable reference genome assembly for studying Perciformes evolution.
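The reciprocal best-hit step can be sketched from BLAST tabular output (`-outfmt 6`, whose 11th and 12th columns are the E-value and bit score). This Python sketch is an assumption about the procedure, not the authors' code: it ranks hits by bit score, applies the 1 × 10^-10 E-value cutoff stated above, and keeps a pair only if each sequence is the other's best hit. The gene identifiers and the `row` helper in the example are hypothetical.

```python
def best_hits(rows: list[str], max_evalue: float = 1e-10) -> dict[str, str]:
    """Map each query to its highest-bit-score subject in BLAST -outfmt 6 lines."""
    best: dict[str, tuple[str, float]] = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:  # keep only hits with E-value below the cutoff
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: list[str], b_vs_a: list[str]) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def row(q: str, s: str, evalue: float, bits: float) -> str:
    """Hypothetical helper: a 12-column outfmt 6 line with placeholder middle columns."""
    return "\t".join([q, s, "90.0", "100", "10", "0", "1", "100", "1", "100", str(evalue), str(bits)])


jp_vs_anc = [row("jp1", "ancA_1", 1e-50, 200), row("jp1", "ancA_2", 1e-40, 150),
             row("jp2", "ancA_2", 1e-30, 120), row("jp3", "ancB_1", 1e-5, 80)]
anc_vs_jp = [row("ancA_1", "jp1", 1e-50, 200), row("ancA_2", "jp1", 1e-40, 150),
             row("ancA_2", "jp2", 1e-30, 120)]
print(reciprocal_best_hits(jp_vs_anc, anc_vs_jp))  # [('jp1', 'ancA_1')]
```

Here `jp2` is dropped because its best hit `ancA_2` prefers `jp1`, and `jp3` is dropped by the E-value cutoff.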
We identified the core genes for omega-3 PUFA biosynthesis in the jade perch genome (Fig. 5a). As reported for most teleosts 11 , elovl2 was absent, whereas single copies of fads2 and elovl5 were present in jade perch (on chr 6 and chr 14, respectively). Two elovl4 genes (elovl4a and elovl4b) were localized on chr 16 and chr 13, respectively (Fig. 5a). This differs from most reported fishes, which usually have only one copy of elovl4 11,16,25,[65][66][67][68] . In addition, jade perch elovl4a shares higher identity with its ortholog in African catfish (Clarias gariepinus; 82.5%; Fig. 5b) than with that in zebrafish (79.7%). This implies that jade perch, similar to African catfish 69 , may efficiently synthesize omega-3 PUFAs, contributing to their accumulation in the flesh.
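Percent identity values such as the 82.5% and 79.7% above depend on how the alignment is scored, and the source does not state which definition was used. One common definition, identical residues divided by aligned columns where neither sequence has a gap, is sketched below in Python as an assumption; other tools divide by the full alignment length or by the shorter sequence length, which gives lower values when gaps are present.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Identical residues / columns where neither aligned sequence has a gap, in percent."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("MKV-LA", "MRVILA"))  # 80.0
```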
Recently, a novel elongase (termed elovl8) with two isotypes (elovl8a and elovl8b) was reported in rabbitfish, and functional experiments revealed that elovl8b, but not elovl8a, can elongate C18 and C20 PUFA precursors to produce longer-chain PUFAs 27 . Similarly, we identified both elovl8a and elovl8b in the jade perch genome (chr 3 and chr 8, respectively; Fig. 5a). At least one of these two elongases may contribute to the high PUFA accumulation in jade perch flesh.
Previous studies reported that fads2, elovl2, elovl5 and elovl8b are expressed primarily in the liver, whereas elovl4a, elovl4b and elovl8a are mainly present in the brain, eye and gonad 9,25,27,66,69,70 . We therefore speculate that members of both the fads and elovl gene families may enhance PUFA biosynthesis in jade perch, resulting in abundant omega-3 PUFAs at least in its flesh.
Furthermore, these PUFA biosynthesis-related genes are each located on a different chromosome (Fig. 5a). This scattered distribution suggests that the gene copies are unlikely to have arisen by tandem duplication and more likely derived from the fish-specific genome duplication.

Data Records
The genome assembly and annotation files were deposited in the China National GeneBank (CNGB) under accession number CNP0002889 71 and in NCBI under accession number SRP370737 72 . Raw genome and transcriptome reads generated in this study were deposited in NCBI under accession number SRP370737 72 .

Technical Validation
Agarose gel electrophoresis was used to check the quality of the extracted DNA. The main band was around 20 kb, and the A260/A280 ratios were above 1.8. The quality of the purified RNA was assessed with a NanoDrop ND-1000 spectrophotometer (LabTech, Corinth, MS, USA), which gave A260/A280 ratios above 1.7, and with a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), which gave an RIN of 8.0. We further evaluated the completeness of the jade perch genome assembly with BUSCO v5.2.2 and found that 97.9% of BUSCO genes were complete.